Skip to content

test: A2A ITK interoperability harness + baseline (no SDK changes)#35

Open
zeroasterisk wants to merge 1 commit into
actioncard:mainfrom
zeroasterisk:itk-harness-showcase
Open

test: A2A ITK interoperability harness + baseline (no SDK changes)#35
zeroasterisk wants to merge 1 commit into
actioncard:mainfrom
zeroasterisk:itk-harness-showcase

Conversation

@zeroasterisk
Copy link
Copy Markdown

ITK Interoperability Harness + Baseline (showcase — no SDK changes)

This PR adds a test-harness that drives the unmodified A2A Elixir SDK against the official A2A Interoperability Test Kit (ITK), plus an honest capability/gap baseline report. It's a measuring stick, not a fix — zero changes to lib/.

Why

Establish a reproducible baseline so future v1.0-compliance work is gap-driven and regression-gated, and so reviewers can see exactly what the SDK can/can't do against a real A2A client today.

What's added (harness + docs only)

  • test/support/itk/instruction.ex — ITK Instruction protobuf codec
  • test/support/itk/agent.ex — JSON-RPC handler / instruction interpreter
  • test/itk/server.exs — standalone Bandit server (v0.3 card, JSON-RPC, SSE)
  • test/itk/{instruction,agent}_test.exs + binary fixtures
  • docs/ITK_BASELINE.md — full capability/gap report with reproducible evidence

git diff --stat origin/main -- lib/ is empty (verified).

Baseline summary

✅ Works (404 unit tests green against pristine SDK):

  • v0.3-shaped agent card (preferredTransport: JSONRPC, protocolVersion: 0.3.0)
  • ITK Instruction proto decode (return_response, steps, call_agent)
  • Interpreter + Task construction; non-streaming message/send round-trip

❌ Gaps (vs a2a-sdk 0.3.24 Python client — documented, not fixed here):

  • JSON-RPC enum encoding: SDK emits ROLE_AGENT / TASK_STATE_COMPLETED (proto-style); v0.3 client expects agent / completed
  • SSE streaming event envelopes: Task snapshot vs the TaskStatusUpdateEvent / TaskArtifactUpdateEvent union (missing taskId/kind)
  • Card shape: SDK's native encode_agent_card emits v1.0 supportedInterfaces; v0.3 client wants preferredTransport/additionalInterfaces

Prioritized v1.0 gap list (seeds future, separate PRs)

  1. JSON-RPC enum encoding (highest leverage — blocks every traversal)
  2. Streaming event envelopes
  3. v0.3 agent-card emission
  4. gRPC / REST transports (deferred)

Reproduce

mix test                                              # 404 + 2 doctests, 0 failures
MIX_ENV=test mix run test/itk/server.exs --httpPort 10130
# + ITK driver: uv run --no-sources python itk_baseline_elixir.py

Note: the standalone server must run under MIX_ENV=test (harness modules live in test/support/, only on elixirc_paths in test env).

Adds a test-harness that drives the unmodified A2A Elixir SDK against the
official A2A Interoperability Test Kit (ITK), plus an honest capability/gap
baseline report. This is a measuring-stick PR — no changes to lib/.

Harness:
- test/support/itk/instruction.ex  ITK Instruction protobuf codec
- test/support/itk/agent.ex         JSON-RPC handler / instruction interpreter
- test/itk/server.exs               standalone Bandit server (v0.3 card, JSON-RPC, SSE)
- test/itk/{instruction,agent}_test.exs + fixtures

Report (docs/ITK_BASELINE.md) documents, with reproducible evidence:
- WORKS: v0.3 agent card, proto decode, interpreter, non-streaming message/send (404 tests green)
- GAPS:  JSON-RPC enum encoding (ROLE_AGENT/TASK_STATE_* vs v0.3 agent/completed),
         SSE streaming event envelopes, v1.0 vs v0.3 card shape
- Prioritized v1.0 gap list seeding future (separate) SDK work

Reference client: a2a-sdk 0.3.24.
@zeroasterisk
Copy link
Copy Markdown
Author

The plan is: Land this PR with dustin for v0.3 and 1.0. then upgrade.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

TCK 1.0-dev Compatibility Results (experimental)

This run is informational — failures do not block CI.

             A2A TCK Compatibility Report              
═══════════════════════════════════════════════════════
SUT: http://localhost:9999
Timestamp: 2026-06-01T09:59:11.645751+00:00

OVERALL COMPATIBILITY: 44.8%

┌─────────────┬────────┬────────┬─────────┬───────┐
│ Level       │ Passed │ Failed │ Skipped │ Total │
├─────────────┼────────┼────────┼─────────┼───────┤
│ MUST        │     26 │     53 │      35 │   114 │
│ SHOULD      │      2 │      9 │       0 │    11 │
│ MAY         │      2 │      2 │       0 │     4 │
└─────────────┴────────┴────────┴─────────┴───────┘

BY TRANSPORT:
  agent_card:    8/10 ⚠
  grpc:          0/72 (72 skipped) ✓
  jsonrpc:       28/99 (30 skipped) ⚠
  http_json:     3/83 (80 skipped) ✓

FAILED REQUIREMENTS:
  ✗ CARD-CACHE-002 (agent_card): Agent Card response should include an ETag header
  ✗ CARD-CACHE-003 (agent_card): Agent Card response may include a Last-Modified header
  ✗ DM-ART-001 (jsonrpc): Response contains no artifacts
  ✗ DM-MSG-001 (jsonrpc): Expected a Message response, but got a Task or no payload
  ✗ DM-TASK-001 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ DM-TASK-002 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ DM-MSG-002 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ DM-PART-001 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ DM-STATUS-001 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ DM-SERIAL-004 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ VER-SERVER-002 (jsonrpc): Expected VersionNotSupportedError for A2A-Version: 99.0
  ✗ JSONRPC-SSE-002 (): Error code mismatch: expected ContentTypeNotSupportedError (-32005), got ParseError (-32700)
  ✗ JSONRPC-ERR-003 (): error.data is absent — A2A errors MUST include ErrorInfo in data array
  ✗ CORE-SEND-001 (jsonrpc): $.task: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ CORE-SEND-003 (jsonrpc): Operation failed: Invalid parameters
  ✗ CORE-LIST-001 (jsonrpc): $.tasks[0]: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ CORE-LIST-002 (jsonrpc): $.tasks[0]: 'kind' does not match any of the regexes: '^(context_id)$'
  ✗ CORE-LIST-003 (jsonrpc): $.tasks[0]: 'kind' does not match any of the regexes: '^(context_id)$'

@maxekman
Copy link
Copy Markdown
Contributor

maxekman commented Jun 1, 2026

🤖 AI-generated comment on behalf of @maxekman — reviewed but not hand-written.

Now that #36 (v1.0 wire format, PR 1 of #13) has landed on main, this harness will see a different set of "gaps" against a2a-sdk 0.3.24 — flat Part and the dropped kind discriminator add to the existing ROLE_* / TASK_STATE_* divergence. The harness itself is unaffected (no lib/ changes), but docs/ITK_BASELINE.md will need a rebase + rerun.

Two things worth aligning on before this lands:

  1. Framing. A2A v1.0 Protocol Support #13's stated direction is "emit v1.0, accept v0.3 on decode." Under that plan, the deltas this doc lists aren't gaps to fix — they're known v0.3-client divergences during v1.0 migration. Could the report be reframed that way? Otherwise the doc and the roadmap point in opposite directions and whoever picks up PR 2 will get whipsawed.

  2. Independent value. The ITK harness itself (Instruction proto codec + interpreter + standalone server) is useful regardless of which direction the wire format moves — it's a real-client interop lens that the TCK doesn't give us. Happy to land it on that basis.

Concretely: rerun against current main, update the baseline numbers, and reword the "Prioritized v1.0 gap list" as "v0.3 client compat notes" (or similar). After that I think this is good to merge.

@maxekman
Copy link
Copy Markdown
Contributor

maxekman commented Jun 1, 2026

Also note that it's always good to run mix format before pushing, and mix quality to match what CI gates on. Both are also documented in AGENTS.md.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants